2 votes
2 answers
301 views

Advantage computed the wrong way?

This code was written by Maxim Lapan; I am reading his book, Deep Reinforcement Learning Hands-On. One line in his code looks really odd to me. In the accumulation of the policy gradient $...
jgauth